For the final project, our group cleaned, explored, and analyzed four data sets from the City of Berkeley containing information on stops, calls for service, arrests, and jail bookings made by the Berkeley Police Department in 2016 (the stop data also cover 2015). Rather than limiting the project to a single data set, we combined several to gain a more holistic and comprehensive understanding of the data. The additional variables expanded our capacity to manipulate the data and examine relationships, and helped make our results more reliable.
With the variety of information made available by collating multiple data sets, the project's objective was to study differences in police activity (in terms of calls for service and patrols) and in the severity of assessed offenses by time, race, gender, age, and mental health. As a stretch goal, we also aimed to build an interactive map applet showing the density of police activity in Berkeley, in which a visitor could enter an address, see how close it is to recorded incidents, and view the types of incidents that occurred most commonly in that area.
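The address-lookup part of the planned applet comes down to a distance query. The sketch below is a minimal Python illustration, not the project's code: it assumes incidents have already been geocoded into records with hypothetical keys `lat`, `lon`, and `type`, keeps those within a radius of the visitor's location using the haversine great-circle distance, and counts their types. The 0.5 km default radius is an arbitrary illustrative choice.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_incident_types(incidents, lat, lon, radius_km=0.5, top_n=3):
    """Return the top_n most common incident types within radius_km of (lat, lon).

    incidents: iterable of dicts with the hypothetical keys "lat", "lon", "type".
    Result is a list of (type, count) pairs, most common first.
    """
    counts = Counter(
        inc["type"]
        for inc in incidents
        if haversine_km(lat, lon, inc["lat"], inc["lon"]) <= radius_km
    )
    return counts.most_common(top_n)
```

In the applet, the visitor's address would first be geocoded to a `(lat, lon)` pair and then passed to `nearby_incident_types`.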
The stop data contain 16,255 incidents handled by the Berkeley police. For each incident, the call date and time, location, incident type, and disposition(s) were recorded. Each subject involved in an incident typically received a six-character disposition whose characters encode, in order, race, gender, age, reason for the stop, enforcement, and car search. Additional dispositions of one to three characters could also be entered to convey other information. To prepare the data for exploration, we converted the call date and time to date-time objects with lubridate, geocoded the addresses into longitude and latitude using Google Maps, and split the dispositions so that each individual assessed in an incident has a separate row, with the six-character and “other” dispositions isolated in different columns.
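The disposition-splitting step can be sketched in Python as follows; this is an illustration, not the project's R code. It assumes each stop's dispositions arrive as one comma-separated string (as the “other” values listed below suggest), that a token of exactly six characters describes one subject, and that any other non-empty token is an “other” disposition. The function name is hypothetical, and the letter values inside each field are left undecoded.

```python
# Order of the six characters in a per-subject disposition code, as described above.
FIELDS = ("race", "gender", "age", "reason", "enforcement", "car_search")


def split_dispositions(raw, sep=","):
    """Split one stop's disposition string into per-subject rows and "other" codes.

    Assumptions (hypothetical, not confirmed by a data dictionary): tokens are
    separated by `sep`; a token of exactly six characters describes one subject;
    any other non-empty token is an "other" disposition.
    """
    tokens = [t.strip() for t in raw.split(sep)]
    subjects, other = [], []
    for token in tokens:
        if not token:
            continue  # skip empty pieces, e.g. from a trailing separator
        if len(token) == len(FIELDS):
            # One row per subject: map each character position to its field.
            subjects.append(dict(zip(FIELDS, token)))
        else:
            other.append(token)
    return subjects, other
```

Each dict in `subjects` can become its own row with the stop's incident number repeated, which gives the one-row-per-individual layout described above, while `other` fills the separate “other” column.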
The arrest and jail-booking data sets (205 and 223 entries, respectively) contain similar information: case, arrest, or booking number; date and time; type; subject information (name, race, sex, date of birth, age, height, weight, hair color, eye color, and occupation); and statute information (type, description, agency, and disposition). Cleaning required converting the dates and times to lubridate date-times, combining the two data sets into a consistent format, and making other adjustments as needed. The geocoding code written for the stop data was reused to convert these addresses to longitude and latitude coordinates.
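The project's geocoding was done in R. The Python sketch below only illustrates the request and response shape involved, assuming Google's Geocoding API JSON layout (`status`, then `results[0].geometry.location.lat/lng`) and a caller-supplied API key; the helper names and the `city_suffix` parameter are hypothetical. Fetching the URL (for example with `urllib.request.urlopen` and `json.load`) is left to the caller. Returning NaN for failed lookups keeps those rows in the table with missing coordinates.

```python
import math
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key, city_suffix=", Berkeley, CA"):
    """Build a Geocoding API request URL for a street address.

    city_suffix (hypothetical) appends the city to disambiguate bare street addresses.
    """
    query = urlencode({"address": address + city_suffix, "key": api_key})
    return GEOCODE_ENDPOINT + "?" + query


def parse_geocode(response):
    """Extract (lat, lng) from a decoded Geocoding API JSON response.

    Returns (nan, nan) when the lookup failed, so the address stays in the
    table as a row with missing coordinates instead of being dropped.
    """
    if response.get("status") != "OK" or not response.get("results"):
        return (math.nan, math.nan)
    location = response["results"][0]["geometry"]["location"]
    return (location["lat"], location["lng"])
```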
## [1] "" "00000" "AR" "AR, M" "AR, M, P"
## [6] "AR, P" "FC" "FC, M" "IN" "M"
## [11] "M, P" "MH" "MH, AR, P" "MH, M" "MH, P"
## [16] "P" "TOW" "TOW," "TOW, AR" "TOW, AR, M"
## [21] "TOW, AR, P" "TOW, CO, P" "TOW, FC" "TOW, IN" "TOW, IN, AR"
## [26] "TOW, M" "TOW, P"
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=37.865887,-122.276384&zoom=14&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## OGR data source with driver: ESRI Shapefile
## Source: "Census_Tract_Polygons2010", layer: "Census_tracts_2010"
## with 33 features
## It has 12 fields
## Regions defined for each Polygons
We obtained census data from the City of Berkeley open data website and created a heat map of Berkeley's population. We then created a similar heat map of the stop data. Interestingly, the most densely populated area (around North Berkeley) sees relatively few stops, while the area where people are most likely to be stopped (Downtown Berkeley) is less densely populated. Because Downtown Berkeley is a transportation hub, many people who do not live there pass through it. North Berkeley has more residents, but it is visited mainly by those residents. Therefore, the place where people are most likely to be stopped is not where the most people live.
## Warning: Removed 1700 rows containing non-finite values (stat_density2d).
## Warning: Removed 933 rows containing non-finite values (stat_density2d).
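The heat maps above come from ggplot2's `stat_density2d`, a kernel density estimate. As a rough stand-in (a plain 2D histogram, not a KDE), the Python sketch below bins coordinates into square grid cells and reports the busiest one. It also mirrors the warnings above: rows with non-finite coordinates, such as addresses that failed to geocode, are dropped and counted rather than plotted. The 0.005-degree cell size (roughly half a kilometre at Berkeley's latitude) and the function names are illustrative assumptions.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=0.005):
    """Count (lat, lon) points per square grid cell of side cell_deg degrees.

    Returns (counts, dropped): counts maps (row, col) cell indices to point
    counts; dropped is how many points had a non-finite coordinate.
    """
    counts = Counter()
    dropped = 0
    for lat, lon in points:
        if not (math.isfinite(lat) and math.isfinite(lon)):
            dropped += 1  # like stat_density2d, exclude NaN/inf rows
            continue
        # floor() so negative longitudes bin consistently
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts, dropped


def busiest_cell_center(counts, cell_deg=0.005):
    """Return (lat, lon, count) for the centre of the most populated cell, or None."""
    if not counts:
        return None
    (row, col), n = counts.most_common(1)[0]
    return ((row + 0.5) * cell_deg, (col + 0.5) * cell_deg, n)
```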
We then explored the stop data by age range (under 18, 18-29, 30-39, and 40+) and by race. As with the overall stop density, the area where people are most likely to be stopped is the same for every group: Downtown Berkeley. Differences in age range and race do not appear to change where in Berkeley people are most likely to be stopped. These two analyses support our explanation of the overall stop heat map.
Note: 1 refers to ages under 18; 2 refers to ages 18-29; 3 refers to ages 30-39; and 4 refers to ages 40+.
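The stop data already record age as a code, but the arrest and jail-booking data carry numeric ages, which could be mapped onto the same four codes for comparison. This Python sketch assumes integer ages and that an age of exactly 18 belongs to code 2 (so code 1 covers ages under 18); the function name is hypothetical.

```python
import bisect

# Lower bounds of age codes 2, 3 and 4 (assumption: 18 falls in code 2).
_AGE_CUTS = (18, 30, 40)


def age_code(age):
    """Map a numeric age to the four age codes: 1 (<18), 2 (18-29), 3 (30-39), 4 (40+)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many cut points are <= age, giving 0..3.
    return bisect.bisect_right(_AGE_CUTS, age) + 1
```

Using `bisect` keeps the bin edges in one tuple, so changing a boundary does not require editing a chain of `if` statements.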